2025-02-17 10:28:34, AIbase
Microsoft Releases OmniParser V2.0: Converting Screenshots into Structured Formats for LLM Processing
Microsoft recently released OmniParser V2.0, a parsing tool that converts user interface (UI) screenshots into structured formats. OmniParser is designed to improve the performance of UI agents built on large language models (LLMs), helping users better understand and interact with the information on their screens. Its training data includes an interactive icon detection dataset, curated from popular websites and automatically annotated to highlight clickable and actionable regions.
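To make the idea of a "structured format" more concrete, here is a minimal Python sketch of what a parsed screenshot could look like once it is handed to an LLM. This is not OmniParser's actual API or output schema: the `UIElement` class, its field names, and the two helper functions are all hypothetical and chosen only for illustration, and the bounding boxes are assumed to be normalized to the range 0 to 1.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Tuple


@dataclass
class UIElement:
    """One detected screen region (hypothetical schema, not OmniParser's)."""
    element_id: int
    kind: str                                  # e.g. "icon" or "text"
    label: str                                 # caption or recognized text
    bbox: Tuple[float, float, float, float]    # assumed normalized x1, y1, x2, y2
    interactable: bool                         # True if clickable/actionable


def to_structured_json(elements: List[UIElement]) -> str:
    """Serialize all detected elements as a JSON array."""
    return json.dumps([asdict(e) for e in elements], indent=2)


def describe_for_llm(elements: List[UIElement]) -> str:
    """Build a compact text listing of only the interactable elements."""
    lines = []
    for e in elements:
        if e.interactable:
            x1, y1, x2, y2 = e.bbox
            lines.append(
                f"[{e.element_id}] {e.kind} '{e.label}' "
                f"at ({x1:.2f}, {y1:.2f}, {x2:.2f}, {y2:.2f})"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up example data standing in for a parser's detections.
    sample = [
        UIElement(0, "icon", "Search", (0.10, 0.05, 0.20, 0.10), True),
        UIElement(1, "text", "Welcome back", (0.30, 0.20, 0.70, 0.25), False),
    ]
    print(to_structured_json(sample))
    print(describe_for_llm(sample))
```

An agent built this way would reason over the text listing and refer to elements by ID rather than guessing pixel coordinates from the raw image, which is the general benefit a structured representation is meant to provide.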